🎮 Reinforcement Learning - jcbush · Scour

Goal-Conditioned Reinforcement Learning from Sub-Optimal Data on Metric Spaces

arxiv.org·14h

📊Optimization

Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation

dev.to·2d

📊Statistical Computing

Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling

arxiv.org·14h

🎲Bayesian statistics

Show HN: Fighting the War Against Expensive Reinforcement Learning

cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app·12h

🤖Machine learning

Optimizing post-disaster road restoration with reinforcement learning: A traveler-behavior-aware approach

sciencedirect.com·3h

📊Optimization

A Conceptual Framework for Exploration Hacking

lesswrong.com·3h

🎲Bayesian statistics

A training principle for drifting models

breno.bearblog.dev·8h

🤖Machine learning

Recursive self-improvement from AI models

marginalrevolution.com·2d

📊Optimization

Generalized Lanczos method for systematic optimization of neural-network quantum states

link.aps.org·9h

📊Optimization

Learning Optimization Tools

trendhunter.com·2d

📊Optimization

How to Leverage Explainable AI for Better Business Decisions

towardsdatascience.com·4h

🤖Machine learning

Robotics Motion Learning: Training Linked Robot Arms with Kuramoto Models

hackernoon.com·1d

🤖Machine learning

Researchers propose a self-distillation fix for ‘catastrophic forgetting’ in LLMs

infoworld.com·9h

📊Optimization

EyesOff: Why Some Models Quantize Better Than Others

ym2132.github.io·20h

🤖Machine learning

ashworks1706/rlhf-from-scratch: A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and its applications in Large Language Models from scratch.

github.com·2d

📊Statistical Computing

A masterclass in AI security operations

redcanary.com·5h

🤖Machine learning

In defense of wasting time

fastcompany.com·18m

🧘Digital Minimalism

Feedback Control for Computer Systems

janert.org·12h

📊Statistical Computing

AI Beyond The Chatbot: The New Value Chain

seekingalpha.com·6h

🤖Machine learning

A multi-agent reinforcement learning approach to autonomous aircraft taxiing with taxiing time, fuel consumption, and emission optimization

sciencedirect.com·1d

📊Optimization
